The Company They Keep: Extracting Japanese Neologisms Using Language Patterns
Abstract
We describe an investigation into identifying and extracting unrecorded potential lexical items in Japanese text by detecting passages that contain selected language patterns typically associated with such items. We identified a set of suitable patterns and then tested them on two large text collections drawn from the WWW and Twitter. Samples of the extracted items were evaluated, and the results demonstrated that the approach has considerable potential for identifying terms for later lexicographic analysis.
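The abstract does not list the patterns themselves. As a rough illustration of the general idea only, the Python sketch below scans text for a few cue patterns and keeps the captured strings that are not already in a known-word list. The three cue patterns (「X」という言葉 "the word 'X'", いわゆる「X」 "so-called 'X'", 「X」とは "'X' means"), the names `CUE_PATTERNS` and `extract_candidates`, and the toy lexicon are all hypothetical assumptions, not the authors' actual pattern set or implementation.

```python
from __future__ import annotations

import re

# HYPOTHETICAL cue patterns, chosen for illustration only (not the paper's set).
# Each captures a 2-20 character string X inside Japanese corner brackets
# next to a cue that often introduces or glosses an unfamiliar word.
CUE_PATTERNS = [
    re.compile(r"「([^」]{2,20})」という(?:言葉|用語|単語)"),  # 「X」という言葉: "the word 'X'"
    re.compile(r"いわゆる「([^」]{2,20})」"),                  # いわゆる「X」: "so-called 'X'"
    re.compile(r"「([^」]{2,20})」とは"),                      # 「X」とは: "'X' means..."
]


def extract_candidates(text: str, known_words: set[str]) -> list[str]:
    """Return strings captured by any cue pattern that are not in known_words.

    Candidates are returned once each, in the order they are first found
    (all matches of the first pattern, then the second, and so on).
    """
    found: list[str] = []
    for pattern in CUE_PATTERNS:
        for match in pattern.finditer(text):
            term = match.group(1)
            # Skip words the (hypothetical) lexicon already records, and duplicates.
            if term not in known_words and term not in found:
                found.append(term)
    return found


if __name__ == "__main__":
    sample = "最近「ググる」という言葉をよく聞く。いわゆる「ステマ」も話題だ。「電話」とは通信手段である。"
    lexicon = {"電話"}  # toy stand-in for an existing dictionary
    print(extract_candidates(sample, lexicon))  # prints ['ググる', 'ステマ']
```

In a real pipeline the surviving candidates would only be a shortlist: as the abstract notes, samples still need manual evaluation before any item is treated as a genuine lexical item.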
Similar resources
Automated Extraction of Swedish Neologisms using a Temporally
This thesis presents an automated system for extracting neologisms using machine learning approaches. The neologisms are extracted from a large temporally annotated corpus containing newspaper articles and blog posts. We find that our system is different from much of the previous research on neologism extraction and justify these differences by relating it to current research in evolutionary li...
Mining and Classification of Neologisms in Persian Blogs
The exponential growth of the Persian blogosphere and the increased number of neologisms create a major challenge in NLP applications of Persian blogs. This paper describes a method for extracting and classifying newly constructed words and borrowings from Persian blog posts. The analysis of the occurrence of neologisms across five distinct topic categories points to a correspondence between th...
Reference Resolution Using Semantic Patterns In Japanese Newspaper Articles
Reference resolution is one of the important tasks in natural language processing. In Japanese newspaper articles, pronouns are not often used as referential expressions for company names, but shortened company names and dousha (“the same company”) are used more often (Muraki et al. 1993). Although there have been studies of reference resolution for various noun phrases in Japanese (Shibata et ...
Identification of Neologisms in Japanese by Corpus Analysis
In Japanese and other languages that do not use spaces or other markers between words, the identification and extraction of neologisms and other unrecorded words presents some particular challenges. In this paper we discuss the problems encountered with neologism identification and describe and discuss some of the methods that have been employed to overcome these problems.
Socio-cultural Patterns in Iranian High School Textbooks from the View point of Motivation for Research
Introduction One very important aspect of any textbook is its content in terms of the motivation it creates in the readers. This is specifically true in EFL textbooks where the learners need more than just content since content-wise, such books are not very much different from the learners’ world knowledge level. That is why material developers working in this area are usually consciously choos...
Publication date: 2017